{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "special-institution",
   "metadata": {},
   "source": [
    "# Multiple Seasonal-Trend decomposition using LOESS (MSTL)\n",
    "\n",
    "This notebook illustrates the use of `MSTL` [1] to decompose a time series into a: trend component, multiple seasonal components, and a residual component. MSTL uses STL (Seasonal-Trend decomposition using LOESS) to iteratively extract seasonal components from a time series. The key inputs into `MSTL` are:\n",
    "\n",
    "* `periods` - The period of each seasonal component (e.g., for hourly data with daily and weekly seasonality we would have: `periods=(24, 24*7)`.\n",
    "* `windows` - The lengths of each seasonal smoother with respect to each period. If these are large then the seasonal component will show less variability over time. Must be odd. If `None` a set of default values determined by experiments in the original paper [1] are used.\n",
    "* `lmbda` - The lambda parameter for a Box-Cox transformation prior to decomposition. If `None` then no transformation is done. If `\"auto\"` then an appropriate value for lambda is automatically selected from the data.\n",
    "* `iterate` - Number of iterations to use to refine the seasonal component.\n",
    "* `stl_kwargs` - All the other parameters which can be passed to STL (e.g., `robust`, `seasonal_deg`, etc.). See [STL docs](https://www.statsmodels.org/dev/generated/statsmodels.tsa.seasonal.STL.html).\n",
    "\n",
    "[1] [K. Bandura, R.J. Hyndman, and C. Bergmeir (2021)\n",
    "    MSTL: A Seasonal-Trend Decomposition Algorithm for Time Series with Multiple\n",
    "    Seasonal Patterns. arXiv preprint arXiv:2107.13462.](https://arxiv.org/pdf/2107.13462.pdf)\n",
    "    \n",
    "Note there are some key differences in this implementation to [1](https://arxiv.org/pdf/2107.13462.pdf). Missing data must be handled outside of the `MSTL` class. The algorithm proposed in the paper handles a case when there is no seasonality. This implementation assumes that there is at least one seasonal component.\n",
    "\n",
    "First we import the required packages, prepare the graphics environment, and prepare the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "multiple-september",
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "from pandas.plotting import register_matplotlib_converters\n",
    "\n",
    "from statsmodels.tsa.seasonal import MSTL, DecomposeResult\n",
    "\n",
    "register_matplotlib_converters()\n",
    "sns.set_style(\"darkgrid\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "retired-fellowship",
   "metadata": {},
   "outputs": [],
   "source": [
    "plt.rc(\"figure\", figsize=(16, 12))\n",
    "plt.rc(\"font\", size=13)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "226736d1-5edf-475e-8cf6-3bb339ae39f9",
   "metadata": {},
   "source": [
    "## MSTL applied to a toy dataset "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "intensive-disorder",
   "metadata": {},
   "source": [
    "### Create a toy dataset with multiple seasonalities"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "difficult-phenomenon",
   "metadata": {},
   "source": [
    "We create a time series with hourly frequency that has a daily and weekly seasonality which follow a sine wave. We demonstrate a more real world example later in the notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "organized-apparatus",
   "metadata": {},
   "outputs": [],
   "source": [
    "t = np.arange(1, 1000)\n",
    "daily_seasonality = 5 * np.sin(2 * np.pi * t / 24)\n",
    "weekly_seasonality = 10 * np.sin(2 * np.pi * t / (24 * 7))\n",
    "trend = 0.0001 * t**2\n",
    "y = trend + daily_seasonality + weekly_seasonality + np.random.randn(len(t))\n",
    "ts = pd.date_range(start=\"2020-01-01\", freq=\"h\", periods=len(t))\n",
    "df = pd.DataFrame(data=y, index=ts, columns=[\"y\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fourth-jones",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "early-traveler",
   "metadata": {},
   "source": [
    "Let's plot the time series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "jewish-marijuana",
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"y\"].plot(figsize=[10, 5])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "acknowledged-updating",
   "metadata": {},
   "source": [
    "### Decompose the toy dataset with MSTL"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "03912506-aee6-4094-89b3-2a41f2c8713d",
   "metadata": {},
   "source": [
    "Let's use MSTL to decompose the time series into a trend component, daily and weekly seasonal component, and residual component."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "assumed-foundation",
   "metadata": {},
   "outputs": [],
   "source": [
    "mstl = MSTL(df[\"y\"], periods=[24, 24 * 7])\n",
    "res = mstl.fit()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fifth-harvey",
   "metadata": {},
   "source": [
    "If the input is a pandas dataframe then the output for the seasonal component is a dataframe. The period for each component is reflect in the column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "collective-apparel",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.seasonal.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "first-gabriel",
   "metadata": {},
   "outputs": [],
   "source": [
    "ax = res.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "successful-alaska",
   "metadata": {},
   "source": [
    "We see that the hourly and weekly seasonal components have been extracted."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "sophisticated-dining",
   "metadata": {},
   "source": [
    "Any of the STL parameters other than `period` and `seasonal` (as they are set by `periods` and `windows` in `MSTL`) can also be set by passing arg:value pairs as a dictionary to `stl_kwargs` (we will show that in an example now).\n",
    "\n",
    "Here we show that we can still set the trend smoother of STL via `trend` and order of the polynomial for the seasonal fit via `seasonal_deg`. We will also explicitly set the `windows`, `seasonal_deg`, and `iterate` parameter explicitly. We will get a worse fit but this is just an example of how to pass these parameters to the `MSTL` class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "immune-azerbaijan",
   "metadata": {},
   "outputs": [],
   "source": [
    "mstl = MSTL(\n",
    "    df,\n",
    "    periods=[\n",
    "        24,\n",
    "        24 * 7,\n",
    "    ],  # The periods and windows must be the same length and will correspond to one another.\n",
    "    windows=[\n",
    "        101,\n",
    "        101,\n",
    "    ],  # Setting this large along with `seasonal_deg=0` will force the seasonality to be periodic.\n",
    "    iterate=3,\n",
    "    stl_kwargs={\n",
    "        \"trend\": 1001,  # Setting this large will force the trend to be smoother.\n",
    "        \"seasonal_deg\": 0,  # Means the seasonal smoother is fit with a moving average.\n",
    "    },\n",
    ")\n",
    "res = mstl.fit()\n",
    "ax = res.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "floral-matrix",
   "metadata": {},
   "source": [
    "## MSTL applied to electricity demand dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "legislative-trout",
   "metadata": {},
   "source": [
    "### Prepare the data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "entertaining-nevada",
   "metadata": {},
   "source": [
    "We will use the Victoria electricity demand dataset found here: \n",
    "https://github.com/tidyverts/tsibbledata/tree/master/data-raw/vic_elec. This dataset is used in the [original MSTL paper [1]](https://arxiv.org/pdf/2107.13462.pdf). It is the total electricity demand at a half hourly granularity for the state of Victora in Australia from 2002 to the start of 2015. A more detailed description of the dataset can be found [here](https://rdrr.io/cran/tsibbledata/man/vic_elec.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "familiar-court",
   "metadata": {},
   "outputs": [],
   "source": [
    "url = \"https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv\"\n",
    "df = pd.read_csv(url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "respective-personal",
   "metadata": {},
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "conditional-polyester",
   "metadata": {},
   "source": [
    "The date are integers representing the number of days from an origin date. The origin date for this dataset is determined from [here](https://github.com/tidyverts/tsibbledata/blob/master/data-raw/vic_elec/vic_elec.R) and [here](https://robjhyndman.com/hyndsight/electrictsibbles/) and is \"1899-12-30\". The `Period` integers refer to 30 minute intervals in a 24 hour day, hence there are 48 for each day.\n",
    "\n",
    "\n",
    "\n",
    "Let's extract the date and date-time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "frequent-panama",
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"Date\"] = df[\"Date\"].apply(\n",
    "    lambda x: pd.Timestamp(\"1899-12-30\") + pd.Timedelta(x, unit=\"days\")\n",
    ")\n",
    "df[\"ds\"] = df[\"Date\"] + pd.to_timedelta((df[\"Period\"] - 1) * 30, unit=\"m\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "private-petersburg",
   "metadata": {},
   "source": [
    "We will be interested in `OperationalLessIndustrial` which is the electricity demand excluding the demand from certain high energy industrial users. We will resample the data to hourly and filter the data to the same time period as [original MSTL paper [1]](https://arxiv.org/pdf/2107.13462.pdf) which is the first 149 days of the year 2012."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fewer-dietary",
   "metadata": {},
   "outputs": [],
   "source": [
    "timeseries = df[[\"ds\", \"OperationalLessIndustrial\"]]\n",
    "timeseries.columns = [\n",
    "    \"ds\",\n",
    "    \"y\",\n",
    "]  # Rename to OperationalLessIndustrial to y for simplicity.\n",
    "\n",
    "# Filter for first 149 days of 2012.\n",
    "start_date = pd.to_datetime(\"2012-01-01\")\n",
    "end_date = start_date + pd.Timedelta(\"149D\")\n",
    "mask = (timeseries[\"ds\"] >= start_date) & (timeseries[\"ds\"] < end_date)\n",
    "timeseries = timeseries[mask]\n",
    "\n",
    "# Resample to hourly\n",
    "timeseries = timeseries.set_index(\"ds\").resample(\"h\").sum()\n",
    "timeseries.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "returning-commercial",
   "metadata": {},
   "source": [
    "### Decompose electricity demand using MSTL"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bd694778-e8c0-41ee-975f-2e86779f4d2a",
   "metadata": {},
   "source": [
    "Let's apply MSTL to this dataset.\n",
    "\n",
    "Note: `stl_kwargs` are set to give results close to [[1]](https://arxiv.org/pdf/2107.13462.pdf) which used R and therefore has a slightly different default settings for the underlying `STL` parameters. It would be rare to manually set `inner_iter` and `outer_iter` explicitly in practice."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "charged-masters",
   "metadata": {},
   "outputs": [],
   "source": [
    "mstl = MSTL(\n",
    "    timeseries[\"y\"],\n",
    "    periods=[24, 24 * 7],\n",
    "    iterate=3,\n",
    "    stl_kwargs={\"seasonal_deg\": 0, \"inner_iter\": 2, \"outer_iter\": 0},\n",
    ")\n",
    "res = mstl.fit()  # Use .fit() to perform and return the decomposition\n",
    "ax = res.plot()\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38e7f030-be63-465e-aa58-eed65beeaf40",
   "metadata": {},
   "source": [
    "The multiple seasonal components are stored as a pandas dataframe in the `seasonal` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb04409d-c671-4b73-b50b-142cb56539bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "res.seasonal.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25977848-bea5-49fb-ae54-5a385c7ce2cd",
   "metadata": {},
   "source": [
    "Let's inspect the seasonal components in a bit more detail and look at the first few days and weeks to examine the daily and weekly seasonality. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b4205d1e-e389-439f-88ca-a3e9f86e215d",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(nrows=2, figsize=[10, 10])\n",
    "res.seasonal[\"seasonal_24\"].iloc[: 24 * 3].plot(ax=ax[0])\n",
    "ax[0].set_ylabel(\"seasonal_24\")\n",
    "ax[0].set_title(\"Daily seasonality\")\n",
    "\n",
    "res.seasonal[\"seasonal_168\"].iloc[: 24 * 7 * 3].plot(ax=ax[1])\n",
    "ax[1].set_ylabel(\"seasonal_168\")\n",
    "ax[1].set_title(\"Weekly seasonality\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fb31614-fd4c-476c-b8d2-cb7c390febb1",
   "metadata": {},
   "source": [
    "We can see that the daily seasonality of electricity demand is well captured. This is the first few days in January so during the summer months in Australia there is a peak in the afternoon most likely due to air conditioning use. \n",
    "\n",
    "For the weekly seasonality we can see that there is less usage during the weekends.\n",
    "\n",
    "One of the advantages of MSTL is that is allows us to capture seasonality which changes over time. So let's look at the seasonality during cooler months in May."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "580be278-9943-42a0-8a70-081d6e926652",
   "metadata": {},
   "outputs": [],
   "source": [
    "fig, ax = plt.subplots(nrows=2, figsize=[10, 10])\n",
    "mask = res.seasonal.index.month == 5\n",
    "res.seasonal[mask][\"seasonal_24\"].iloc[: 24 * 3].plot(ax=ax[0])\n",
    "ax[0].set_ylabel(\"seasonal_24\")\n",
    "ax[0].set_title(\"Daily seasonality\")\n",
    "\n",
    "res.seasonal[mask][\"seasonal_168\"].iloc[: 24 * 7 * 3].plot(ax=ax[1])\n",
    "ax[1].set_ylabel(\"seasonal_168\")\n",
    "ax[1].set_title(\"Weekly seasonality\")\n",
    "\n",
    "plt.tight_layout()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1f94791-45d6-4059-9422-dc4da14332bc",
   "metadata": {},
   "source": [
    "Now we can see an additional peak in the evening! This could be related to heating and lighting now required in the evenings. So this makes sense. We see that main weekly pattern of lower demand over the weekends continue."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5785619-1724-40f9-99a7-c9c71fab64a5",
   "metadata": {},
   "source": [
    "The other components can also be extracted from the `trend` and `resid` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0271089-7c5c-438b-b060-62ab53f46974",
   "metadata": {},
   "outputs": [],
   "source": [
    "display(res.trend.head())  # trend component\n",
    "display(res.resid.head())  # residual component"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b6d817a-aa21-4e8e-b94c-4dc8e0d7bb89",
   "metadata": {},
   "source": [
    "And that's it! Using MSTL we can perform time series decompostion on a multi-seasonal time series!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
